Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics

Authors

  • Yuji Oshima
  • Shinnosuke Takamichi
  • Tomoki Toda
  • Graham Neubig
  • Sakriani Sakti
  • Satoshi Nakamura
Abstract

This paper presents a novel non-native speech synthesis technique that preserves the individuality of a non-native speaker. Cross-lingual speech synthesis based on voice conversion or HMM-based speech synthesis generates foreign-language speech for a specific non-native speaker by reflecting speaker-dependent acoustic characteristics extracted from the speaker’s natural speech in his/her mother tongue. This tends to degrade speaker individuality in the synthetic speech compared with intra-lingual speech synthesis. This paper proposes a new approach to cross-lingual speech synthesis that preserves speaker individuality by explicitly using non-native speech spoken by the target speaker. Although the use of non-native speech makes it possible to preserve speaker individuality in the synthesized target speech, naturalness is significantly degraded because the speech is directly affected by unnatural prosody and pronunciation, often caused by differences between the linguistic systems of the source and target languages. To improve naturalness while preserving speaker individuality, we propose (1) a prosodic correction method based on model adaptation, and (2) a phonetic correction method based on spectrum replacement for unvoiced consonants. The experimental results demonstrate that the proposed methods significantly improve naturalness while preserving speaker individuality in synthetic speech.
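
The abstract names spectrum replacement for unvoiced consonants but does not describe the procedure. As a rough illustration only, the sketch below assumes frame-level spectral features that are already time-aligned between the non-native speaker and a native reference, plus a per-frame voicing flag. The function name `replace_unvoiced_spectrum` and all of its parameters are hypothetical and are not taken from the paper, whose method may operate on model parameters rather than on frames.

```python
from typing import List, Sequence


def replace_unvoiced_spectrum(
    nonnative: Sequence[Sequence[float]],
    native: Sequence[Sequence[float]],
    voiced: Sequence[bool],
) -> List[List[float]]:
    """Return spectral frames where unvoiced frames come from the native reference.

    Assumptions (not from the paper): the two frame sequences are
    time-aligned, and `voiced[i]` is True when frame i is voiced.
    Voiced frames keep the non-native speaker's own spectrum, which is
    where most of the speaker individuality is assumed to live.
    """
    if not (len(nonnative) == len(native) == len(voiced)):
        raise ValueError("all inputs must have the same number of frames")

    corrected: List[List[float]] = []
    for nn_frame, nat_frame, is_voiced in zip(nonnative, native, voiced):
        # Keep the speaker's spectrum when voiced, swap in the native one otherwise.
        source = nn_frame if is_voiced else nat_frame
        corrected.append(list(source))
    return corrected


# Toy example with 2-dimensional "spectral" features for three frames.
nonnative_frames = [[1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
native_frames = [[9.0, 9.1], [8.0, 8.1], [7.0, 7.1]]
voicing = [True, False, True]
result = replace_unvoiced_spectrum(nonnative_frames, native_frames, voicing)
```

In this toy run only the middle (unvoiced) frame is taken from the native reference. A real system would also need forced alignment and a decision on which unvoiced segments count as consonants, both of which are outside this sketch.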

Similar resources

Non-Native Text-to-Speech Preserving Speaker Individuality Based on Partial Correction of Prosodic and Phonetic Characteristics

This paper presents a novel non-native speech synthesis technique that preserves the individuality of a non-native speaker. Cross-lingual speech synthesis based on voice conversion or Hidden Markov Model (HMM)-based speech synthesis is a technique to synthesize foreign language speech using a target speaker’s natural speech uttered in his/her mother tongue. Although the technique holds promise t...

Unit Selection Speech Synthesis Using Phonetic-Prosodic Description of Speech Databases

This paper describes an approach to speech synthesis based on using speech databases at different stages of the TTS process. Speech database units are phones in different segmental and prosodic contexts. Pitch-synchronous segmentation and labeling of databases allow storing both segmental and prosodic information. Phonetic-prosodic annotations of speech databases are involved in off-line training ...

Automatic accentedness evaluation of non-native speech using phonetic and sub-phonetic posterior probabilities

Automatic evaluation of non-native speech accentedness has potential implications not only for language learning and accent identification systems but also for speaker and speech recognition systems. From the perspective of speech production, the two primary factors influencing accentedness are the phonetic and prosodic structure. In this paper, we propose an approach for automatic accented...

Voice morphing and the manipulation of intra-speaker and cross-speaker phonetic variation to create foreign accent continua: a perceptual study

The STRAIGHT system of voice morphing was used to create voice continua of (Korean) accented Australian English, intended to simulate phonetic variation ranging from ‘heavily accented’ to ‘unaccented’ (native-like) Australian English, employing dimensions of intra-speaker and cross-speaker variation to yield a range of synthetic voices. These synthetic voices were evaluated against actual sampl...

A corpus-based analysis of transfer effects and connected speech processes in Vietnamese English

This paper presents a corpus-based descriptive analysis of the most prevalent transfer effects and connected speech processes observed in a comparison of 11 Vietnamese English speakers (6 females, 5 males) and 12 Australian English speakers (6 males, 6 females) over 24 grammatical paraphrase items. The phonetic processes are segmentally labelled in terms of IPA diacritic features using the EMU ...

Publication date: 2015